Motivated by the statistical evaluation of complex computer models,
we deal with the issue of objective prior specification for the parameters
of Gaussian processes. In particular, we derive the Jeffreys-rule,
independence Jeffreys and reference priors for this situation,
and prove that the resulting posterior distributions are proper under
a quite general set of conditions. A proper flat prior strategy, based on
maximum likelihood estimates, is also considered, and all priors are
then compared on the grounds of the frequentist properties of the ensuing
Bayesian procedures. Computational issues are also addressed
in the paper, and we illustrate the proposed solutions by means of an
example taken from the field of complex computer model validation.

1. Introduction. In this paper we address the problem of specifying
objective priors for the parameters of general Gaussian processes. We derive
formulas for the Jeffreys-rule prior, for an independence Jeffreys prior and
for a reference prior on the parameters involved in the parametric
specification of the mean and covariance functions of a Gaussian process. The
mean is assumed to be a q-dimensional linear model on location-dependent
covariates, and the correlation function involves an r-dimensional vector of
unknown parameters. The resulting posteriors are shown to be proper
under a more restrictive scenario. We also address computational issues, and
in particular we devise a sampling scheme to draw from the resulting
posteriors that requires very little input from the user. A method aimed at
producing proper "diffuse" priors, based on maximum likelihood estimates,
is also described, and we present the results of a simulation study designed
to compare the frequentist properties of all the Bayesian methods described
in the paper. An example based on real data is used to illustrate some of
the proposed solutions.

The motivation for considering the problems addressed in this paper is 
both theoretical and practical. From the applied point of view, these re- 
sults are of interest in general for the field of spatial statistics, but espe- 
cially for the analysis and validation of complex computer models. Indeed, 
one prominent approach to this problem involves fitting a Gaussian pro- 
cess to the computer model output, and a separable correlation function in- 
volving several parameters, typically a multidimensional power exponential, 
is frequently assumed — see, for example, Sacks, Welch, Mitchell and Wynn 
(1989), Kennedy and O'Hagan (2000, 2001) and Bayarri et al. (2002). The 
computer models are often computationally extremely demanding and this 
is a way of providing a cheaper surrogate that can be used in the design of 
computer experiments, optimization problems, prediction and uncertainty 
analysis, calibration and validation of the model. In the Bayesian approach, 
one must specify prior distributions for the parameters of the Gaussian pro- 
cesses. Typically, little or no prior information about the parameters is avail- 
able, and their interpretation is not always straightforward, so that auto- 
matic or default procedures are sought. The need for default specification 
of priors, and also for the development of computational schemes for the 
resulting posteriors, is thus considerable in this area. 

From a more theoretical perspective, this article is also relevant. Berger, De Oliveira and Sanso 
(2001) consider objective Bayesian analysis of Gaussian spatial processes 
with a quite general correlation structure, but the study is restricted to the 
situation where only the (one-dimensional) range parameter is considered 
to be unknown. The original motivation for their paper was the observation 
that commonly prescribed default priors could fail to yield proper poste- 
riors. A more in-depth study of the problem revealed very interesting and 
unusual facts. In the presence of an unknown mean level for the Gaussian 
process, the integrated likelihood for the parameters governing the corre- 
lation structure is typically bounded away from zero, which explains the 
difficulty with posterior propriety. Also, the independence Jeffreys prior (as- 
suming the parameters in the mean level are a priori independent of the 
ones involved in the covariance), which is often prescribed, fails to yield 
a proper posterior when the mean function includes an unknown constant 
level. The usual algorithm for the reference prior of Bernardo (1979) and 
Berger and Bernardo (1992), in which asymptotic marginalization is used, 
also fails to produce a proper posterior — this is the first known example in 
which exact marginalization is required to achieve posterior propriety. The 
authors end up recommending this "exact" reference prior. 
The next logical step is to investigate whether these results still hold in
higher dimensions, which is precisely one of the aims of the present article. 
The answer is no, in the sense that there are no surprises in terms of posterior 
propriety, and we describe in detail the reasons for that. 

The paper is organized as follows. Section 2 sets up some notation and 
establishes formulas for the objective priors that are valid in a very general 
setting. The next section begins by describing the scenario where analytical 
results have been achieved, and in the sequel we analyze the behavior of the 
integrated likelihood and of the priors. Finally, conditions ensuring posterior 
propriety are determined. 

Section 4 addresses computational aspects of the problem, and in particu- 
lar we describe a Markov chain algorithm to sample from the posterior that 
requires very little input from the user. This algorithm involves computing 
maximum likelihood estimates and the Fisher information matrix. Taking 
advantage of the availability of these quantities, we propose also an empir- 
ical Bayes approach to the problem that aims at reproducing the practice 
of placing proper flat priors on the parameters. This section ends with an 
example, using real data, that illustrates some of the proposed solutions. 

Section 5 presents the results of a simulation study designed to compare 
the frequentist properties of the Bayesian procedures proposed in the paper, 
and ends with some final recommendations. 

In the Appendix we present the proofs of the various results that are 
described in the body of the paper. 

2. Notation and the objective priors. Let us consider the following rather
general situation: $Y(\cdot)$ is a Gaussian process on $S \subset \mathbb{R}^p$ with mean and
covariance functions given, respectively, by

$E\,Y(x) = \Psi(x)'\theta$ and $\mathrm{Cov}(Y(x), Y(x^*)) = \sigma^2 c(x, x^* \mid \xi)$,

where $c(\cdot,\cdot)$ is the correlation function, $\sigma^2$ is the variance and $\xi$ is an
$r$-dimensional vector of unknown positive parameters. The vector $\Psi(x) =
(\psi_1(x), \ldots, \psi_q(x))'$ is a $q$-vector of location-dependent covariates. Define
$\eta = (\sigma^2, \theta', \xi')'$.

The stochastic process $Y(\cdot)$ is observed at locations $S = \{x_1, \ldots, x_n\}$ and
hence the resulting random vector, $Y = (Y(x_1), \ldots, Y(x_n))'$, satisfies

(2.1) $Y \mid \eta \sim N(X\theta, \sigma^2\Sigma)$,

where $\Sigma = \Sigma(\xi) = [c(x_i, x_j \mid \xi)]_{ij}$ and

(2.2) $X = (\Psi(x_1) \cdots \Psi(x_n))'$.

As a result, the associated likelihood function based on the observed data $y$
is given by

(2.3) $L(\eta \mid y) \propto (\sigma^2)^{-n/2}\,|\Sigma|^{-1/2} \exp\left\{ -\dfrac{1}{2\sigma^2}(y - X\theta)'\Sigma^{-1}(y - X\theta) \right\}$.
The priors on $\eta$ that we will consider are of the form

(2.4) $\pi(\eta) \propto \dfrac{\pi(\xi)}{(\sigma^2)^a}$

for different $\pi(\xi)$ and $a$. Indeed, we will in the next two propositions show
that this is the case for the Jeffreys-rule prior, for an independence Jeffreys
prior and for a reference prior. This general prior follows a familiar form: flat
on the parameters that specify the mean and usual forms for the variance $\sigma^2$.

Following Berger, De Oliveira and Sanso (2001), in order to derive the
reference prior we specify the parameter of interest to be $(\sigma^2, \xi)$ and consider
$\theta$ to be the nuisance parameter. This corresponds in the reference prior
algorithm to factoring the prior as

$\pi^R(\eta) = \pi^R(\theta \mid \sigma^2, \xi)\,\pi^R(\sigma^2, \xi)$

and selecting $\pi^R(\theta \mid \sigma^2, \xi) \propto 1$, because that is the Jeffreys-rule prior for the
model at hand when $\sigma^2$ and $\xi$ are considered known. Next, $\pi^R(\sigma^2, \xi)$ is
calculated as the reference prior but for the marginal experiment defined by the
integrated likelihood with respect to $\pi^R(\theta)$. It is this marginalization step that
usually is carried out in an asymptotic fashion; Berger, De Oliveira and Sanso
(2001) recommend that, for statistical models with a complicated covariance
structure, the exact marginalization should become standard practice.

Before stating the formula for this reference prior, let us define $\dot\Sigma^k$ as
the matrix that results from $\Sigma$ by differentiating each of its components
with respect to the $k$th component of $\xi$. Also, note that the integrated
likelihood with respect to $\pi^R(\theta) \propto 1$ is in this case

(2.5) $L^I(\sigma^2, \xi \mid y) = \displaystyle\int L(\eta \mid y)\,\pi^R(\theta)\,d\theta \propto (\sigma^2)^{-(n-q)/2}\,|\Sigma|^{-1/2}\,|X'\Sigma^{-1}X|^{-1/2} \exp\left\{ -\dfrac{S_\xi^2}{2\sigma^2} \right\}$,

where $S_\xi^2 = y'Qy$, $Q = \Sigma^{-1}P$ and $P = I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$. Berger, De Oliveira and Sanso
(2001), resorting to a result by Harville (1974), point out that there is a
particular transformation of the data which has sampling distribution
proportional to (2.5), and hence it is legitimate to compute the associated
Jeffreys-rule prior.

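Proposition 2.1. The reference prior $\pi^R(\eta)$ is of the form (2.4) with

(2.6) $a = 1$ and $\pi^R(\xi) \propto |I_R(\xi)|^{1/2}$,

where, with $W_k = \dot\Sigma^k Q$, $k = 1, \ldots, r$,

(2.7) $I_R(\xi) = \begin{pmatrix} n - q & \operatorname{tr} W_1 & \operatorname{tr} W_2 & \cdots & \operatorname{tr} W_r \\ & \operatorname{tr} W_1^2 & \operatorname{tr} W_1 W_2 & \cdots & \operatorname{tr} W_1 W_r \\ & & \ddots & & \vdots \\ & & & & \operatorname{tr} W_r^2 \end{pmatrix}$

(the matrix is symmetric; only its upper triangle is displayed).

For the proof see Appendix A.0.2.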

In the next proposition, we present the formulas for the two Jeffreys-type 
priors we will consider. 

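Proposition 2.2. The independence Jeffreys prior, $\pi^{J1}$, obtained by
assuming $\theta$ and $(\sigma^2, \xi)$ a priori independent, and the Jeffreys-rule prior,
$\pi^{J2}$, are of the form (2.4) with, respectively,

(2.8) $a = 1$ and $\pi^{J1}(\xi) \propto |I_J(\xi)|^{1/2}$

and

(2.9) $a = 1 + \dfrac{q}{2}$ and $\pi^{J2}(\xi) \propto |X'\Sigma^{-1}X|^{1/2}\,\pi^{J1}(\xi)$,

where $U_k = \dot\Sigma^k \Sigma^{-1}$ and

(2.10) $I_J(\xi) = \begin{pmatrix} n & \operatorname{tr} U_1 & \operatorname{tr} U_2 & \cdots & \operatorname{tr} U_r \\ & \operatorname{tr} U_1^2 & \operatorname{tr} U_1 U_2 & \cdots & \operatorname{tr} U_1 U_r \\ & & \ddots & & \vdots \\ & & & & \operatorname{tr} U_r^2 \end{pmatrix}$

(again symmetric, with only the upper triangle displayed).

For the proof see Appendix A.0.2.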
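
As an illustration of Propositions 2.1 and 2.2, the following minimal sketch evaluates the three marginal priors, up to a constant, at a single value of $\xi$, using $W_k = \dot\Sigma^k Q$ for the reference prior and $U_k = \dot\Sigma^k \Sigma^{-1}$ for the Jeffreys-type priors. The function name `objective_prior_factors`, the toy design and the exponential correlation $\rho(d;\xi) = \exp(-d\xi)$ are assumptions made for the example; they are not taken from the paper.

```python
import numpy as np

def objective_prior_factors(Sigma, dSigma, X):
    """Evaluate pi^R(xi), pi^J1(xi), pi^J2(xi), each up to a constant.

    Sigma  : (n, n) correlation matrix at the chosen xi
    dSigma : list of r (n, n) matrices, derivative of Sigma w.r.t. xi_k
    X      : (n, q) design matrix
    (Hypothetical helper written for illustration only.)
    """
    n, q = X.shape
    Sinv = np.linalg.inv(Sigma)
    XtSinvX = X.T @ Sinv @ X
    P = np.eye(n) - X @ np.linalg.solve(XtSinvX, X.T @ Sinv)
    Q = Sinv @ P
    W = [dS @ Q for dS in dSigma]      # W_k = dSigma^k Q
    U = [dS @ Sinv for dS in dSigma]   # U_k = dSigma^k Sigma^{-1}

    def info(mats, corner):
        # Symmetric (r+1) x (r+1) matrix with `corner` in the top-left entry,
        # traces in the first row/column and tr(M_j M_k) elsewhere.
        r = len(mats)
        out = np.empty((r + 1, r + 1))
        out[0, 0] = corner
        for j, Mj in enumerate(mats):
            out[0, j + 1] = out[j + 1, 0] = np.trace(Mj)
            for k, Mk in enumerate(mats):
                out[j + 1, k + 1] = np.trace(Mj @ Mk)
        return out

    pi_R = np.sqrt(np.linalg.det(info(W, n - q)))
    pi_J1 = np.sqrt(np.linalg.det(info(U, n)))
    pi_J2 = np.sqrt(np.linalg.det(XtSinvX)) * pi_J1
    return pi_R, pi_J1, pi_J2

# Toy usage: exponential correlation, constant mean (q = 1), r = 1.
x = np.array([0.0, 0.3, 1.0, 1.7, 2.5])
d = np.abs(x[:, None] - x[None, :])
xi = 1.0
Sigma = np.exp(-d * xi)
dSigma = [-d * Sigma]                  # derivative of exp(-d * xi) in xi
X = np.ones((len(x), 1))
pi_R, pi_J1, pi_J2 = objective_prior_factors(Sigma, dSigma, X)
```

The sketch forms the full $n \times n$ inverse explicitly, which is adequate only for small designs.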

For priors of the form (2.4), it is possible to integrate explicitly the product
of the likelihood and the prior over $(\sigma^2, \theta)$. Indeed, standard calculations
yield [as long as $a > 1 - (n - q)/2$]

$\displaystyle\int L(\eta \mid y)\,\pi(\eta)\,d\theta\,d\sigma^2 = L^I(\xi \mid y)\,\pi(\xi)$,

where

(2.11) $L^I(\xi \mid y) \propto |\Sigma|^{-1/2}\,|X'\Sigma^{-1}X|^{-1/2}\,(S_\xi^2)^{-((n-q)/2 + a - 1)}$.

It is clear that the posterior associated with the prior (2.4) is proper if and
only if $0 < \int_\Omega L^I(\xi \mid y)\,\pi(\xi)\,d\xi < \infty$, where $\Omega \subset \mathbb{R}^r$ is the parametric space
of $\xi$.
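
A minimal sketch of how the integrated likelihood (2.11) might be evaluated on the log scale is given next. It relies on the identity that $S_\xi^2 = y'Qy$ equals the generalized least squares residual sum of squares, so $Q$ never has to be formed. The function name and the toy data are assumptions made for the example.

```python
import numpy as np

def log_integrated_likelihood(y, X, Sigma, a=1.0):
    """log L^I(xi | y) of (2.11), up to an additive constant (illustrative)."""
    n, q = X.shape
    L = np.linalg.cholesky(Sigma)            # Sigma = L L'
    y_w = np.linalg.solve(L, y)              # whitened data
    X_w = np.linalg.solve(L, X)              # whitened design
    XtSinvX = X_w.T @ X_w                    # X' Sigma^{-1} X
    theta_hat = np.linalg.solve(XtSinvX, X_w.T @ y_w)   # GLS estimate
    resid = y_w - X_w @ theta_hat
    S2 = resid @ resid                       # equals y'Qy
    logdet_Sigma = 2.0 * np.sum(np.log(np.diag(L)))
    logdet_XtSinvX = np.linalg.slogdet(XtSinvX)[1]
    return (-0.5 * logdet_Sigma - 0.5 * logdet_XtSinvX
            - ((n - q) / 2.0 + a - 1.0) * np.log(S2))

# Toy usage with an exponential correlation and a linear mean (q = 2).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 6)
d = np.abs(x[:, None] - x[None, :])
X = np.column_stack([np.ones_like(x), x])
y = rng.normal(size=x.size)
value = log_integrated_likelihood(y, X, np.exp(-d * 1.5), a=1.0)
```

Working with the Cholesky factor rather than an explicit inverse is a design choice of this sketch, made for numerical stability.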

It appears to be difficult to study analytically the properties of both the
integrated likelihood $L^I(\xi \mid y)$ and the function $\pi(\xi)$ (which we will often
refer to, in an abuse of terminology, as the marginal prior for $\xi$) in such a
general setting. In the next section we will make the problem more amenable
to analytical treatment by introducing a number of assumptions defining a
more restrictive scenario.
3. Posterior propriety. In this section we show that the objective
priors yield proper posteriors for an important special case. The proof is
constructed as follows. In Section 3.1 we introduce the notion of a separable
correlation function and the restrictions we assume on the linear model 
specifying the mean function. Next, under added assumptions we describe 
the behavior of the integrated likelihood, whereas in Section 3.3 we achieve 
the same goal but for each of the objective priors determined in Section 2. 
Putting these results together, we are able to prove the theorem stated in 
Section 3.4. 

3.1. Separable correlation functions. Let $p \ge 2$ and consider the partition
of a general element $x \in S \subset \mathbb{R}^p$ given by $x = (x_1, x_2, \ldots, x_r)$, where $r \le p$
and each subvector $x_k$, $k = 1, \ldots, r$, is of dimension $p_k$ with $\sum_{k=1}^r p_k = p$.

Denote by $d(x, y)$ the Euclidean distance between two vectors. It is a simple
consequence of Bochner's theorem [cf. Cressie (1993)] that if $c_k(x_k, x_k^*) =
c_k(d(x_k, x_k^*))$, $k = 1, \ldots, r$, are isotropic correlation functions in $\mathbb{R}^{p_k}$, then
$c(x, x^*) = \prod_{k=1}^r c_k(d(x_k, x_k^*))$ is a valid correlation function in $\mathbb{R}^p$. Such
correlation functions are called separable, or to be more precise, partially
separable (the fully separable case corresponds to the choice $r = p$, $p_k = 1$).

If a separable correlation function is used and if furthermore the set of
locations at which the process is observed forms a suitable Cartesian
product, it is easy to see that the correlation matrix of the data is the Kronecker
product of the individual correlation matrices associated with each
dimension $x_k$. (Recall that the Kronecker product of two matrices, $A = [a_{ij}]$ and
$B$, is defined by $A \otimes B = [a_{ij} B]$.) It is essentially in this setting that we
will study the analytical properties of the integrated likelihood and priors,
along with establishing sufficient conditions for posterior propriety. We next
make these statements precise.
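
The Kronecker structure can be checked numerically. The sketch below, which assumes two one-dimensional factors, an exponential correlation $\exp(-d\xi_k)$ in each, and design points ordered with the first coordinate varying slowest, compares the correlation matrix built directly on the Cartesian product with the Kronecker product of the two factor matrices. All names and numerical values are illustrative.

```python
import itertools
import numpy as np

def exp_corr(points, xi):
    """Correlation matrix exp(-|x_i - x_j| * xi) for 1-D points (illustrative)."""
    dist = np.abs(points[:, None] - points[None, :])
    return np.exp(-dist * xi)

s1 = np.array([0.0, 0.5, 1.0])           # design set S_1, n_1 = 3
s2 = np.array([0.0, 1.0, 2.0, 4.0])      # design set S_2, n_2 = 4
xi = (0.7, 0.3)                          # one parameter per factor

# Cartesian product S_1 x S_2; itertools.product varies the last coordinate
# fastest, which matches the block layout of np.kron(A, B) = [a_ij * B].
design = np.array(list(itertools.product(s1, s2)))   # shape (12, 2)

d1 = np.abs(design[:, None, 0] - design[None, :, 0])
d2 = np.abs(design[:, None, 1] - design[None, :, 1])
Sigma_direct = np.exp(-xi[0] * d1) * np.exp(-xi[1] * d2)

Sigma_kron = np.kron(exp_corr(s1, xi[0]), exp_corr(s2, xi[1]))
same = np.allclose(Sigma_direct, Sigma_kron)
```

If the design points were listed in a different order, the two matrices would agree only up to a simultaneous permutation of rows and columns.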

We will henceforth assume that the following conditions hold. 

Assumption A1. Separability of the correlation function: with $x =
(x_1, x_2, \ldots, x_r)$, where $p \ge r \ge 2$,

$c(x, x^* \mid \xi) = \displaystyle\prod_{k=1}^r \rho(d(x_k, x_k^*); \xi_k)$,

where $\rho$ is a valid (isotropic) correlation function.

Assumption A2. Cartesian product of the design set:

$S = S_1 \times S_2 \times \cdots \times S_r$,

where $S_k = \{x_{1,k}, \ldots, x_{n_k,k}\} \subset \mathbb{R}^{p_k}$, $\#S_k = n_k$, so that $\#S = n = \prod_{k=1}^r n_k$.

Assumption A3. Mean structure: $q = 1$ and

$X = X_1 \otimes X_2 \otimes \cdots \otimes X_r$

with each $X_k$ of dimension $n_k \times 1$.

A few remarks are in order. First, one should think about the parameter
$\xi_k$ as representing the range parameter of the associated correlation structure;
if other parameters are present, for example, roughness parameters, those
will be assumed known.

In terms of the mean structure, within the framework defined by
Assumption A3 we can find, for instance, the unknown constant mean case,
that is, $\Psi(x) = 1$ so that $X = 1_n = 1_{n_1} \otimes \cdots \otimes 1_{n_r}$. This is in turn an
instance of the more general situation described by $\Psi(x) = \prod_{k \in A} \psi_k(x_k)$ where
$A \subset \{1, \ldots, r\}$. In this circumstance, $X_k = (\psi_k(x_{i,k}),\ i = 1, \ldots, n_k)'$ if $k \in A$;
otherwise $X_k = 1_{n_k}$.

As we alluded to before, Assumptions A1 and A2 jointly allow for a very
convenient Kronecker product expression for the correlation matrix of the
data. To make that clear and to set up some notation as well, we have

$\Sigma = \Sigma_1 \otimes \Sigma_2 \otimes \cdots \otimes \Sigma_r \equiv \displaystyle\bigotimes_{k=1}^r \Sigma_k$,

where $\Sigma_k = [\rho(d(x_{i,k}, x_{j,k}); \xi_k)]_{i,j=1,\ldots,n_k}$, $k = 1, \ldots, r$, are the correlation
matrices associated with each of the separated dimensions $x_k$. Note that
each of these matrices is of dimension $n_k \times n_k$. This fact is further explored in
Section 4. For simplicity we will from here on use the shorthand exemplified
above: we represent by $\bigotimes_{k=1}^r A_k$ the matrix $A_1 \otimes \cdots \otimes A_r$.

The $\rho$ functions we will consider are quite general, but we will focus
particular attention on the families of correlation functions that we list in
Table 1 for future reference. For more details and additional references,
consult Cressie (1993).
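
For concreteness, three of the families can be coded directly as functions of the distance $d$ and the parameter $\xi$. The sketch below assumes the standard forms of the spherical, power exponential and rational quadratic correlations, each depending on $d$ and $\xi$ only through the product $d\xi$; the Matérn family is left out because it requires a modified Bessel function that NumPy does not provide. Function names are illustrative.

```python
import numpy as np

def spherical(d, xi):
    """[1 - 1.5 u + 0.5 u^3] for u = d * xi <= 1, and 0 otherwise."""
    u = np.asarray(d, dtype=float) * xi
    return np.where(u <= 1.0, 1.0 - 1.5 * u + 0.5 * u**3, 0.0)

def power_exponential(d, xi, alpha=1.0):
    """exp[-(d * xi)^alpha], with roughness parameter alpha in (0, 2]."""
    return np.exp(-(np.asarray(d, dtype=float) * xi) ** alpha)

def rational_quadratic(d, xi, alpha=1.0):
    """[1 + (d * xi)^2]^(-alpha), with alpha > 0."""
    return (1.0 + (np.asarray(d, dtype=float) * xi) ** 2) ** (-alpha)

d = np.array([0.0, 0.5, 1.0, 2.0])
values = {f.__name__: f(d, 1.0)
          for f in (spherical, power_exponential, rational_quadratic)}
```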

3.2. Behavior of the integrated likelihood. In this section we will study 
properties of the integrated likelihood. Nevertheless, the next two lemmas 
are key also to the series of results that will follow concerning the analytical 
behavior of the priors. 

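Lemma 3.1. Under Assumptions A1-A3, we have

(3.1) $\Sigma = \Sigma_1 \otimes \Sigma_2 \otimes \cdots \otimes \Sigma_r = \displaystyle\bigotimes_{k=1}^r \Sigma_k$,

(3.2) $Q = \Sigma^{-1}P = \displaystyle\bigotimes_{k=1}^r \Sigma_k^{-1} - \bigotimes_{k=1}^r \Phi_k$,

where $\Phi_k = \Sigma_k^{-1}X_k(X_k'\Sigma_k^{-1}X_k)^{-1}X_k'\Sigma_k^{-1}$, and

(3.3) $|X'\Sigma^{-1}X| = \displaystyle\prod_{k=1}^r X_k'\Sigma_k^{-1}X_k$.

For the proof see Appendix A.0.4.

Table 1
Families of correlation functions

Spherical:           $\rho(d;\xi) = [1 - \tfrac{3}{2}d\xi + \tfrac{1}{2}(d\xi)^3]\,I\{d\xi \le 1\}$;  $\xi > 0$.
Power exponential:   $\rho(d;\xi) = \exp[-(d\xi)^\alpha]$;  $\xi > 0$, $\alpha \in (0,2]$.
Rational quadratic:  $\rho(d;\xi) = [1 + (d\xi)^2]^{-\alpha}$;  $\xi > 0$, $\alpha > 0$.
Matérn:              $\rho(d;\xi) = \dfrac{1}{2^{\alpha-1}\Gamma(\alpha)}(d\xi)^\alpha K_\alpha(d\xi)$;  $\xi > 0$, $\alpha > 0$,
                     where $K_\alpha(\cdot)$ is the modified Bessel function of the second kind and order $\alpha$.

$\xi$ denotes a range parameter; $\alpha$ refers to a roughness parameter; $d$ represents the
Euclidean distance between two locations.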
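
The identities of Lemma 3.1 lend themselves to a quick numerical check. The sketch below assumes $r = 2$, exponential correlations in each factor, $X_1 = 1$ and an arbitrary vector $X_2$; it compares $Q = \Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$ computed on the full matrices with $\bigotimes_k \Sigma_k^{-1} - \bigotimes_k \Phi_k$, and $X'\Sigma^{-1}X$ with $\prod_k X_k'\Sigma_k^{-1}X_k$. All names and values are illustrative.

```python
import numpy as np

def exp_corr(points, xi):
    dist = np.abs(points[:, None] - points[None, :])
    return np.exp(-dist * xi)

def phi(S, x):
    """Phi_k = S^{-1} x (x' S^{-1} x)^{-1} x' S^{-1} for a vector x."""
    v = np.linalg.solve(S, x)
    return np.outer(v, v) / (x @ v)

S1 = exp_corr(np.array([0.0, 0.5, 1.0]), 0.7)
S2 = exp_corr(np.array([0.0, 1.0, 2.0, 4.0]), 0.3)
X1 = np.ones(3)
X2 = np.array([1.0, 2.0, 0.5, 3.0])

Sigma = np.kron(S1, S2)
X = np.kron(X1, X2)                       # q = 1, so X is a vector
Sinv = np.linalg.inv(Sigma)

# Left-hand sides, computed on the full n x n problem.
SinvX = Sinv @ X
Q_direct = Sinv - np.outer(SinvX, SinvX) / (X @ SinvX)
quad_direct = X @ SinvX

# Right-hand sides, computed factor by factor.
Q_kron = (np.kron(np.linalg.inv(S1), np.linalg.inv(S2))
          - np.kron(phi(S1, X1), phi(S2, X2)))
quad_kron = (X1 @ np.linalg.solve(S1, X1)) * (X2 @ np.linalg.solve(S2, X2))
```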

Lemma 3.2. Under Assumptions A1-A3, suppose $\rho(d;\xi)$ is a continuous
function of $\xi > 0$ for every $d \ge 0$ such that:

(i) $\rho(d;\xi) = \rho^0(d\xi)$, where $\rho^0(\cdot)$ is a correlation function satisfying $\lim_{u\to\infty} \rho^0(u) = 0$;

(ii) as $\xi_k \to 0$, each of the correlation matrices $\Sigma_k$ satisfies

(3.4) $\Sigma_k = 1_{n_k}1_{n_k}' + \nu(\xi_k)(D_k + o(1))$, $k = 1, \ldots, r$,

for some continuous nonnegative function $\nu(\cdot)$ and fixed nonsingular matrix $D_k$;

(iii) $D_k$ above, for $k = 1, \ldots, r$, should satisfy $1_{n_k}'D_k^{-1}1_{n_k} \ne 0$, $X_k'D_k^{-1}X_k \ne 0$
and, if $X_k \ne 1$,

(3.5) $1_{n_k}'D_k^{-1}1_{n_k} \ne 1_{n_k}'D_k^{-1}X_k(X_k'D_k^{-1}X_k)^{-1}X_k'D_k^{-1}1_{n_k}$.

Define the quantities

(3.6) $F_k = \dfrac{D_k^{-1}1_{n_k}1_{n_k}'D_k^{-1}}{(1_{n_k}'D_k^{-1}1_{n_k})^2}$,

(3.7) $G_k = D_k^{-1} - \dfrac{D_k^{-1}1_{n_k}1_{n_k}'D_k^{-1}}{1_{n_k}'D_k^{-1}1_{n_k}}$,

(3.8) $H_k = G_kX_k(X_k'G_kX_k)^{-1}X_k'G_k$.

In this setting we have that, as $\xi_k \to 0$, $k = 1, \ldots, r$,

(3.9) $\Sigma_k^{-1} = G_k(1 + o(1))/\nu(\xi_k)$,

(3.10) $|\Sigma_k| = [\nu(\xi_k)]^{n_k - 1}\,|D_k|\,(1_{n_k}'D_k^{-1}1_{n_k})(1 + o(1))$,

(3.11) $X_k'\Sigma_k^{-1}X_k = \begin{cases} 1 + o(1), & \text{if } X_k = 1, \\ \left[X_k'D_k^{-1}X_k - \dfrac{1_{n_k}'D_k^{-1}X_kX_k'D_k^{-1}1_{n_k}}{1_{n_k}'D_k^{-1}1_{n_k}}\right](1 + o(1))/\nu(\xi_k), & \text{otherwise}, \end{cases}$

(3.12) $\Phi_k = \begin{cases} F_k(1 + o(1)), & \text{if } X_k = 1, \\ H_k(1 + o(1))/\nu(\xi_k), & \text{otherwise}. \end{cases}$

For the proof see Appendix A.0.4.
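
The small-$\xi_k$ expansions of Lemma 3.2 can be illustrated numerically for a single factor. The sketch below assumes the exponential correlation $\exp(-d\xi)$, for which $\exp(-d\xi) = 1 - d\xi + o(\xi)$, so that one may take $\nu(\xi) = \xi$ and $D = -[d_{ij}]$; it then compares $|\Sigma|$ with $[\nu(\xi)]^{n-1}|D|(1'D^{-1}1)$ and $\nu(\xi)\Sigma^{-1}$ with $G = D^{-1} - D^{-1}11'D^{-1}/(1'D^{-1}1)$. The design points and the value of $\xi$ are arbitrary choices.

```python
import numpy as np

x = np.array([0.0, 1.0, 3.0])             # three unequally spaced locations
d = np.abs(x[:, None] - x[None, :])
D = -d                                    # exp(-d*xi) = 11' + xi*(D + o(1))
one = np.ones(len(x))
Dinv = np.linalg.inv(D)
c = one @ Dinv @ one                      # 1' D^{-1} 1
G = Dinv - np.outer(Dinv @ one, Dinv @ one) / c

xi = 1e-3
nu = xi                                   # nu(xi) = xi for this family
Sigma = np.exp(-d * xi)

# Ratio of the exact determinant to its leading-order approximation.
det_ratio = np.linalg.det(Sigma) / (nu ** (len(x) - 1) * np.linalg.det(D) * c)

# Largest entrywise gap between nu * Sigma^{-1} and its limit G.
inv_error = np.max(np.abs(nu * np.linalg.inv(Sigma) - G))
```

Both discrepancies shrink as $\xi$ decreases, until rounding error in the nearly singular $\Sigma$ takes over.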

The following result describes the behavior of the integrated likelihood 
(2.11) in the present setting and under the assumptions of the previous 
lemma. 



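Proposition 3.3. Under the conditions of Lemma 3.2, and assuming
a prior of the form (2.4), $L^I(\xi \mid y)$ is a continuous function of $\xi$ given by the
expression

(3.13) $L^I(\xi \mid y) \propto \displaystyle\prod_{k=1}^r \left\{ |\Sigma_k|^{-n_{(k)}/2}\,(X_k'\Sigma_k^{-1}X_k)^{-1/2} \right\} \times [y'Qy]^{-(n-3+2a)/2}$,

where $Q$ is given by (3.2) and $n_{(k)} = \prod_{i \ne k} n_i$.

Additionally, let $A = \{k \in \{1, \ldots, r\} : X_k \ne 1\}$ and $\emptyset \ne B \subset \{1, \ldots, r\}$.
Denote by $\bar{E}$ the complement of a set $E$. Then,

(a) as $\xi_k \to 0$, $k \in B$ and $\xi_k \to \infty$, $k \in \bar{B}$,

(3.14) $L^I(\xi \mid y) \propto \displaystyle\prod_{k \in B} [\nu(\xi_k)]^{(n_{(k)} - 3 + 2a)/2} \prod_{k \in A \cap B} [\nu(\xi_k)]^{1/2}\,(1 + g(\xi))$

with $g(\xi) \to 0$;

(b) as $\xi_k \to 0$, $k \in B$ and $0 < \delta_k < \xi_k < M_k < \infty$, $k \in \bar{B}$,

(3.15) $L^I(\xi \mid y) \propto \displaystyle\prod_{k \in B} [\nu(\xi_k)]^{(n_{(k)} - 3 + 2a)/2} \prod_{k \in A \cap B} [\nu(\xi_k)]^{1/2} \times (1 + g(\{\xi_k, k \in B\})) \times h(\{\xi_k, k \in \bar{B}\})$,

with $g(\cdot) \to 0$ and $h(\cdot)$ continuous on $\{\delta_k < \xi_k < M_k,\ k \in \bar{B}\}$.

For the proof see Appendix A.0.4.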



3.3. Behavior of the priors. In the next two results, properties of both
Jeffreys priors and of the reference prior are described. They are based
on a series of assumptions that we list below and form (except for
Assumption A8) a subset of those introduced in Berger, De Oliveira and Sanso
(2001) in their study of the asymptotic properties of the priors. According
to that paper, the families of correlation functions listed in Table 1
(virtually always) satisfy these properties [e.g., beware of the fact that for the
power exponential with $\alpha = 2$ and more than three equally spaced points, it
is not the case that $D$ is invertible. It is if $\alpha \in (0,2)$]. Note also how these
assumptions imply those of Lemma 3.2. Below we denote by $\dot\Sigma_k$ the matrix
of derivatives of $\Sigma_k$ with respect to $\xi_k$, $k = 1, \ldots, r$.

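Assumptions. Suppose that $\rho(d;\xi) = \rho^0(d\xi)$, where $\rho^0(\cdot)$ is a correlation
function satisfying $\lim_{u\to\infty} \rho^0(u) = 0$, is a continuous function of $\xi > 0$ for
any $d \ge 0$ and that, for $k = 1, \ldots, r$,

Assumption A4. As $\xi_k \to 0$, there are fixed matrices $D_k$, nonsingular,
and $D_k^*$; differentiable functions $\nu(\cdot)$ $(> 0)$ and $\omega(\cdot)$; and a matrix function
$R_k(\cdot)$ that is differentiable as well, such that

(3.16) $\Sigma_k = 1_{n_k}1_{n_k}' + \nu(\xi_k)D_k + \omega(\xi_k)D_k^* + R_k(\xi_k)$,

(3.17) $\dot\Sigma_k = \nu'(\xi_k)D_k + \omega'(\xi_k)D_k^* + \dfrac{\partial}{\partial \xi_k}R_k(\xi_k)$.

Assumption A5. The functions $\nu$ and $R_k$ further satisfy, as $\xi_k \to 0$
and with $\|[a_{ij}]\|_\infty = \max_{ij}\{|a_{ij}|\}$,

$\dfrac{\|R_k(\xi_k)\|_\infty}{\nu(\xi_k)} \to 0, \qquad \dfrac{\|(\partial/\partial\xi_k)R_k(\xi_k)\|_\infty}{\nu'(\xi_k)} \to 0$.

Assumption A6. Above $D_k$ satisfies $1_{n_k}'D_k^{-1}1_{n_k} \ne 0$, $X_k'D_k^{-1}X_k \ne 0$
and, if $X_k \ne 1$,

$1_{n_k}'D_k^{-1}1_{n_k} \ne 1_{n_k}'D_k^{-1}X_k(X_k'D_k^{-1}X_k)^{-1}X_k'D_k^{-1}1_{n_k}$.

Assumption A7. $[\operatorname{tr}(\dot\Sigma_k)^2]^{1/2}$, as a function of $\xi_k$, is integrable at
infinity.

Assumption A8. $n_k \ge 2$.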

We start with the reference prior and then proceed to the Jeffreys-type 
priors. 

Proposition 3.4. For the reference prior (2.6), and under
Assumptions A1-A3, there are functions $\pi_k^R(\xi_k)$, $k = 1, \ldots, r$, such that

(3.18) $\pi^R(\xi) \le \displaystyle\prod_{k=1}^r \pi_k^R(\xi_k)$,

where, as $\xi_k \to 0$, $k = 1, \ldots, r$, and under the added Assumptions A4-A8,

(3.19) $\pi_k^R(\xi_k) \propto \dfrac{\nu'(\xi_k)}{\nu(\xi_k)}$

and $\pi_k^R$ is integrable at infinity.

For the proof see Appendix A.0.5.

Proposition 3.5. Under Assumptions A1-A3, for the independence
Jeffreys prior $\pi^{J1}$ given by (2.8), and the Jeffreys-rule prior $\pi^{J2}$ given by
(2.9), there are functions $\pi_k^{Ji}(\xi_k)$, $i = 1, 2$, $k = 1, \ldots, r$, such that

(3.20) $\pi^{Ji}(\xi) \le \displaystyle\prod_{k=1}^r \pi_k^{Ji}(\xi_k)$,

where, under added Assumptions A4-A8 and for $k = 1, \ldots, r$, we have as
$\xi_k \to 0$,

(3.21) $\pi_k^{Ji}(\xi_k) \propto \begin{cases} \dfrac{\nu'(\xi_k)}{\nu(\xi_k)}\,(1 + o(1)), & \text{if } i = 1 \text{ or } (i = 2 \text{ and } X_k = 1), \\[2ex] \dfrac{\nu'(\xi_k)}{[\nu(\xi_k)]^{3/2}}\,(1 + o(1)), & \text{if } i = 2 \text{ and } X_k \ne 1. \end{cases}$

Also, $\pi_k^{Ji}$ is integrable at infinity.
For the proof see Appendix A.0.5.

3.4. Results on posterior propriety. In this section we investigate the 
propriety of the formal posterior associated with a general prior of the form 
(2.4), in the scenario described throughout the previous sections. We end 
with a result that specifically addresses the case of both Jeffreys priors and 
the reference prior. 

Theorem 3.6. Under the set of Assumptions A1-A8, the posterior
associated with the general prior (2.4), where $\pi(\xi)$ is either $\pi^R$, $\pi^{J1}$ or $\pi^{J2}$,
is proper as long as

(3.23) $a > 1/2$.

For the proof see Appendix A.0.6.

If we restrict attention to the instances of the general prior that are of 
particular interest, then we have the following: 
Corollary 3.7. Under Assumptions A1-A8, the reference, the
independence Jeffreys and the Jeffreys-rule priors yield proper posteriors.

Proof. Just recall that $a = 1$ in the case of the reference and
independence Jeffreys prior, and that for the Jeffreys-rule $a = 1 + q/2$, where by
assumption $q = 1$. □

3.5. Discussion and generalizations. It is interesting to understand what
makes the multidimensional problem different from the unidimensional one, the
case covered in Berger, De Oliveira and Sanso (2001). As we mentioned in
the Introduction, the reason why posterior propriety is difficult to achieve in
the unidimensional case has to do with the behavior of the integrated
likelihood near the origin. For example, if $p = 1$ and $X = 1$, we have, as $\xi \to 0$,
$L^I(\xi \mid y) = O([\nu(\xi)]^{a-1})$, which is independent of the sample size. Roughly
speaking, the first and last factors of (2.11) produce powers of $\nu(\xi)$
depending on the sample size that essentially cancel.

The key formula in the multidimensional case is

$|\Sigma| = \left| \displaystyle\bigotimes_{k=1}^r \Sigma_k \right| = \prod_{k=1}^r |\Sigma_k|^{n_{(k)}}$,

where $n_{(k)} = \prod_{i \ne k} n_i$. Considering (3.10), it is clear that, as $\xi_k \to 0$,
$|\Sigma|^{-1/2} \propto \left[\prod_{i \ne k} |\Sigma_i|^{-n_{(i)}/2}\right] \nu(\xi_k)^{-n/2}\,\nu(\xi_k)^{n_{(k)}/2}\,(1 + o(1))$.
The first factor involving $\nu(\xi_k)$
will again essentially cancel, but we still keep the second, which explains the
different behaviors. For small $\xi_k$ the prior is not that relevant in determining
posterior propriety in the multidimensional case, whereas in the
unidimensional case it is of capital importance.

Another point of interest is the range of applicability of the results of
this section. We would argue that Assumptions A2 and A3 are mathematical
conveniences and, as we have stated before, Assumptions A4-A7 are
virtually always satisfied in the context of the usual families of correlation
functions. The (partial) separability Assumption A1 is indeed restrictive,
and more research is needed before one feels confident about using the
priors derived in Section 2 in circumstances where Assumption A1 is not valid.

Nevertheless, Assumption A1 is not uncommon in practice. Apart from
the field of computer model validation, where it is frequently assumed, var- 
ious other examples can be found in geology applications, where the two 
lateral dimensions are separated from the vertical dimension, and in space- 
time models, where time is separated from the position vector [consider, e.g., 
Fuentes (2003), Short and Carlin (2003) and references therein]. 

We would also like to stress the theoretical interest of the results of this 
section. As we pointed out, Berger, De Oliveira and Sanso (2001) revealed 
very unusual and surprising facts unknown to the spatial statistics
community at the time. There was no apparent reason to expect a different scenario
in any sensible extension of the setting treated in that paper. The present ar- 
ticle seems to indicate that, possibly, Berger, De Oliveira and Sanso (2001) 
deal with an exceptional case, and that, in general, the "standard" type of 
behavior is to be expected. 

There are obvious potential extensions to the present work. One would 
like to treat the smoothness parameters as unknowns and also to consider 
variations on the ordering of the parameter vector in the reference prior. 
These are technically challenging tasks that constitute work in progress. 

4. Computation. There is considerable need in the area of computer 
model validation for default prior specifications and also for efficient com- 
putational schemes. In this section we address several computing issues, and 
in particular that of Bayesian learning in the presence of the objective pri- 
ors derived in this paper. We present an example, taken from the field of 
computer model validation, to illustrate some of the proposed solutions. 

We now return to the general setting of Section 2, and in particular, unless
otherwise noted, $\xi$ represents a general $r$-dimensional vector involved in the
parametric formulation of the correlation function.

4.1. General remarks. Evaluating any of the objective priors at one
particular value of $\xi$ is a computationally intensive task. The reference prior is
the most computationally intensive of all three, followed by the Jeffreys-rule
prior and by the independence Jeffreys prior. In order to convince oneself of
that, it suffices to consider the calculations involved in computing each of the
entries of the matrices (2.7) and (2.10). Since $Q = \Sigma^{-1}P$, to compute the
reference prior at a particular value of $\xi$ one has to calculate the projection
matrix $P$, which is not necessary in any of the other priors. Additionally,
each $W_k$ requires one more matrix product (of two $n \times n$ matrices) than
$U_k$. The difference between the Jeffreys-rule and independence Jeffreys is
only in the computation of $|X'\Sigma^{-1}X|$. For similar reasons, it is clear that
the likelihood function (2.3) is computationally less expensive than either of
the integrated likelihoods given by (2.5) and (2.11). These observations are
particularly relevant in the context of Markov chain Monte Carlo (MCMC)
computations, addressed in Section 4.4, where the prior and the likelihood
have to be evaluated a very large number of times.

One should note that some of the assumptions of Section 3 have
considerable potential impact on the computational side of the problem. When
satisfied, the separability Assumption A1 paired with the Cartesian product
structure of the design set, Assumption A2, which implies (3.1), can be
exploited in order to tremendously simplify and speed up computation. Indeed,
the most expensive (and possibly numerically unstable) calculations of this
problem are certainly computing the inverse of the correlation matrix and
its determinant. Facts 2 and 3 of Appendix A.0.1 essentially state that, in
this case, we only have to compute the inverse and the determinant of each
of the $r$ matrices $\Sigma_k$ of dimension $n_k \times n_k$, and not for the $n \times n$ correlation
matrix $\Sigma$. Even the Cholesky decomposition of $\Sigma$ can be obtained from that
of the $\Sigma_k$, as it is easy to see. This allows for much more freedom on the
number of points at which one observes the stochastic process, as explored
in Bayarri et al. (2002) when dealing with functional output of a computer
model.
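
The computational savings can be sketched as follows. Under the Kronecker structure $\Sigma = \bigotimes_k \Sigma_k$, the log-determinant is $\sum_k n_{(k)} \log|\Sigma_k|$ with $n_{(k)} = n/n_k$, the inverse is the Kronecker product of the factor inverses, and the Cholesky factor is the Kronecker product of the factor Cholesky factors. The example below uses three small, arbitrarily chosen factors and compares each quantity with its full-matrix counterpart; helper names are illustrative.

```python
from functools import reduce
import numpy as np

def kron_all(mats):
    """Kronecker product A_1 x ... x A_r of a list of matrices."""
    return reduce(np.kron, mats)

def kron_logdet(factors):
    """log|Sigma| = sum_k n_(k) log|Sigma_k|, with n_(k) = n / n_k."""
    n = int(np.prod([S.shape[0] for S in factors]))
    return sum((n // S.shape[0]) * np.linalg.slogdet(S)[1] for S in factors)

def exp_corr(points, xi):
    dist = np.abs(points[:, None] - points[None, :])
    return np.exp(-dist * xi)

factors = [exp_corr(np.array([0.0, 0.5, 1.0]), 0.7),
           exp_corr(np.array([0.0, 1.0, 2.0, 4.0]), 0.3),
           exp_corr(np.array([0.0, 2.0]), 1.1)]

Sigma = kron_all(factors)                              # 24 x 24, for checking

logdet_fast = kron_logdet(factors)
inverse_fast = kron_all([np.linalg.inv(S) for S in factors])
chol_fast = kron_all([np.linalg.cholesky(S) for S in factors])

logdet_full = np.linalg.slogdet(Sigma)[1]
inverse_full = np.linalg.inv(Sigma)
chol_full = np.linalg.cholesky(Sigma)
```

In practice one would avoid forming the $n \times n$ Kronecker products at all and work with the factors directly; they are assembled here only to make the comparison explicit.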

4.2. Maximum likelihood estimates. The calculation of maximum likeli- 
hood (ML) estimates turns out to be useful in several ways. 

Indeed, in Section 4.3 we are going to use maximum likelihood estimates in 
devising a mechanism that produces proper flat priors. Also, these estimates 
can be useful in devising sampling schemes that require little input from the 
user, as we will see in Section 4.4. Last, it is often the case that there is not 
enough information in the data to learn about all the parameters involved 
in the correlation structure, especially about the parameters that control 
geometric properties of the process, the so-called roughness parameters. We 
have observed that as long as one fixes those parameters at "sensible" values, 
inference is fairly insensitive to the particular value chosen. Data-dependent 
choices, and in particular maximum likelihood estimates, have produced 
good practical results in our applied work. 

The question of which kind of estimate one should compute is relevant, as 
here we have at our disposal explicit formulas for three distinct (integrated) 
likelihood functions. 

There is considerable empirical evidence supporting the fact that, in gen- 
eral, maximum likelihood estimates derived from integrated likelihoods tend 
to be more stable and also more meaningful, so that one would in princi- 
ple discard the idea of maximizing the joint likelihood (2.3). Also, these 
estimates are not as useful in the context of devising sampling schemes. 

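Maximizing the integrated likelihood $L^I(\sigma^2, \xi \mid y)$ in (2.5) has some
advantages over maximizing $L^I(\xi \mid y)$ in (2.11). First of all, it gives rise to
considerably simpler formulas for the gradient and the associated expected
information matrix. Indeed, it is easy to see from inspection of the proof of
Proposition 2.1 that

$\dfrac{\partial}{\partial \sigma^2} \log L^I(\sigma^2, \xi \mid y) = -\dfrac{n-q}{2\sigma^2} + \dfrac{y'Qy}{2\sigma^4}$,

$\dfrac{\partial}{\partial \xi_k} \log L^I(\sigma^2, \xi \mid y) = -\dfrac{1}{2}\operatorname{tr} W_k + \dfrac{1}{2\sigma^2}\,y'Q\dot\Sigma^kQy$, $k = 1, \ldots, r$,

and that the associated Fisher information matrix is

(4.1) $I(\sigma^2, \xi) = \dfrac{1}{2} \begin{pmatrix} \dfrac{n-q}{(\sigma^2)^2} & \dfrac{1}{\sigma^2}\operatorname{tr} W_1 & \dfrac{1}{\sigma^2}\operatorname{tr} W_2 & \cdots & \dfrac{1}{\sigma^2}\operatorname{tr} W_r \\ & \operatorname{tr} W_1^2 & \operatorname{tr} W_1 W_2 & \cdots & \operatorname{tr} W_1 W_r \\ & & \ddots & & \vdots \\ & & & & \operatorname{tr} W_r^2 \end{pmatrix}$

(symmetric, with only the upper triangle displayed).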

This allows for a very simple optimization algorithm to be used in computing
the associated estimate, namely Fisher's scoring method, a variant of the
Newton-Raphson method that results from approximating the Hessian of
the logarithm of integrated likelihood by its expected value. To be more
precise, the $(s+1)$st iterate of the numerical method is given by

$(\sigma^2)^{(s)} = \dfrac{y'Qy}{n-q}\bigg|_{\xi = \xi^{(s)}}$,

$\xi^{(s+1)} = \xi^{(s)} + \lambda\,[I(\xi^{(s)})]^{-1}\,\dfrac{\partial \log L^I(\sigma^2, \xi \mid y)}{\partial \xi}\bigg|_{\xi = \xi^{(s)},\ \sigma^2 = (\sigma^2)^{(s)}}$,

where $I(\xi)$ results from (4.1) by dropping the first row and column. The
quantity $\lambda$ is the step size of the algorithm.
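
A minimal sketch of one scoring step is given next, under stated assumptions. The gradient used is the one obtained by differentiating the logarithm of (2.5), namely $-\tfrac{1}{2}\operatorname{tr} W_k + y'Q\dot\Sigma^kQy/(2\sigma^2)$ for the $k$th component of $\xi$; the information block is $\tfrac{1}{2}[\operatorname{tr} W_jW_k]$; the variance is set to $y'Qy/(n-q)$ at the current $\xi$; and a component that leaves the positive half-line is simply held at its previous value. The function names, the callable `corr` and the toy data are hypothetical.

```python
import numpy as np

def q_matrix(X, Sigma):
    """Return Q = Sigma^{-1} P and A = X' Sigma^{-1} X."""
    Sinv = np.linalg.inv(Sigma)
    A = X.T @ Sinv @ X
    return Sinv - Sinv @ X @ np.linalg.solve(A, X.T @ Sinv), A

def log_int_lik(y, X, Sigma, sigma2):
    """log of the integrated likelihood (2.5), up to an additive constant."""
    n, q = X.shape
    Q, A = q_matrix(X, Sigma)
    return (-(n - q) / 2.0 * np.log(sigma2)
            - 0.5 * np.linalg.slogdet(Sigma)[1]
            - 0.5 * np.linalg.slogdet(A)[1]
            - (y @ Q @ y) / (2.0 * sigma2))

def score_and_info(y, X, Sigma, dSigma, sigma2):
    """Gradient in xi and the xi-block of the expected information."""
    Q, _ = q_matrix(X, Sigma)
    W = [dS @ Q for dS in dSigma]
    Qy = Q @ y
    grad = np.array([-0.5 * np.trace(Wk) + (Qy @ dS @ Qy) / (2.0 * sigma2)
                     for Wk, dS in zip(W, dSigma)])
    info = 0.5 * np.array([[np.trace(Wj @ Wk) for Wk in W] for Wj in W])
    return grad, info

def scoring_step(y, X, xi, corr, lam=1.0):
    """One Fisher scoring update of xi with step size lam."""
    n, q = X.shape
    Sigma, dSigma = corr(xi)
    Q, _ = q_matrix(X, Sigma)
    sigma2 = (y @ Q @ y) / (n - q)          # variance at the current xi
    grad, info = score_and_info(y, X, Sigma, dSigma, sigma2)
    proposal = xi + lam * np.linalg.solve(info, grad)
    # Keep the old value for any component leaving the parameter space.
    return np.where(proposal > 0, proposal, xi), sigma2

# Toy usage: exponential correlation with a single parameter, constant mean.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 8)
d = np.abs(x[:, None] - x[None, :])
X = np.ones((x.size, 1))
y = rng.normal(size=x.size)

def corr(xi):
    Sigma = np.exp(-d * xi[0])
    return Sigma, [-d * Sigma]

xi0 = np.array([1.0])
xi1, sigma2 = scoring_step(y, X, xi0, corr, lam=0.5)
```

Iterating `scoring_step` until the change in $\xi$ is small gives the estimate; as with any Newton-type method, convergence depends on the starting point and on the step size.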

There are some drawbacks to this simple numerical method. First, it is
possible that, initially, some iterates happen to lie outside the parameter
space. We avoid this problem by simply setting $\xi_k^{(s+1)} = \xi_k^{(s)}$ whenever
the $k$th component of $\xi^{(s+1)}$ happens to not belong to the corresponding
parameter space. Also, it is well known that Newton-Raphson-type methods
are quite sensitive to the starting points, sometimes becoming trapped in
local maxima and sometimes simply not converging. This unfortunately is
not easy to solve. Some experimentation and tuning of $\lambda$ are required in
order to assure convergence.

Getting an estimate of $\xi$ by maximizing (2.11) is not as attractive
because of the fact that the formulas are computationally more involved. For
example, there does not seem to be a closed-form expression for the
associated expected information matrix. Also, in the examples we have dealt with
involving the power exponential family of correlation functions, we did not
notice any improvements over the simpler method. The availability of an
estimate of the variance will also be useful in the next section.



4.3. Proper flat priors. When trying to produce so-called noninformative 
Bayesian analyses, people often consider placing proper flat priors on the 
parameters as an alternative to using priors derived using formal methods. 
The basic idea is to specify priors that are relatively flat in the region of the 
parametric space where most of the posterior mass accumulates.

One way of reproducing this practice is by describing it as an empirical
Bayes method. We do so by placing independent exponential priors centered
at a multiple of the ML estimates on the precision (reciprocal of variance)
and on the components of the vector $\xi$. The constant that multiplies the ML
estimate in order to get the mean of the prior can be determined by
experimentation, making sure that the effect of the prior on the posterior is
relatively small, in other words, making sure that the prior is relatively flat
in the region of the parametric space where the likelihood has most of its
mass. For an example, consider Section 4.5.
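
A minimal sketch of such a prior: an exponential density whose mean is a chosen multiple of the ML estimate has rate 1/(multiple × estimate). The function name `exponential_log_prior` and the default multiple of 10 (the value used later in Section 4.5) are choices made for this illustration.

```python
import numpy as np

def exponential_log_prior(values, ml_estimates, multiple=10.0):
    """Sum of independent exponential log densities with means multiple * MLE."""
    values = np.asarray(values, dtype=float)
    means = multiple * np.asarray(ml_estimates, dtype=float)
    if np.any(values < 0):
        return -np.inf  # outside the support
    return float(np.sum(-np.log(means) - values / means))

# Flatness check near a hypothetical ML estimate of 1.0: the log density
# changes little between half and twice the estimate.
drop = exponential_log_prior([0.5], [1.0]) - exponential_log_prior([2.0], [1.0])
```

The quantity `drop` measures how much the log prior varies over a region around the estimate; a larger multiple makes it smaller, that is, the prior flatter there.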

In Section 5 we will compare the frequentist properties of this approach to 
objective Bayesian analysis to the use of the formal priors we derive in this 
paper. Apart from philosophical reasons, this comparison is also relevant 
because in the present case the empirical Bayes method is computationally 
much simpler than any of the alternatives. 

4.4. Sampling from the posterior. No matter which prior specification
one uses (any of the objective priors or the empirical Bayes method), one
can sample exactly from the full conditionals of $\theta$ and of $\sigma^2$: these are,
respectively, Gaussian and inverse-gamma.

Then, one has the choice between sampling from the marginal posterior
$[\xi \mid y]$ or from the full conditional $[\xi \mid y, \sigma^2, \theta]$. We adopt this latter strategy
since the integrated likelihood is computationally more expensive than the
full likelihood and, based on the experiments we conducted, there are no
apparent gains in terms of mixing in adopting the other strategy.

At this point we have also the option of drawing $\xi$ as a block or not.
Notice that whenever one updates one of the components of $\xi$ one has
basically to recompute the likelihood, as it does not factor in any useful
way. From that perspective, it is therefore more efficient to sample $\xi$ as a
block. Additionally, it is reasonable to expect some serial correlation between
the $\xi_k$ in the MCMC output, and sampling $\xi$ as a block would at least reduce
it.

The question is, can we get a proposal, in a more or less automatic fashion, 
that allows for this? The ML estimate and the expected information matrix 
we described in Section 4.2 are useful in accomplishing this goal. Although 
the resulting sampling scheme is certainly not new and has been widely used, 
it seems to work well in the examples we have examined so far. 

The idea is to consider a reparameterization $\zeta = g(\sigma^2, \xi) = (g_1(\sigma^2), g_2(\xi))$
chosen so that the new parameter space is unconstrained, that is, $\mathbb{R}^{r+1}$. For
this alternative parameterization, as long as g is sufficiently regular, it is
possible to calculate both the marginal ML estimate $\hat\zeta$ and the associated
expected information $I_\zeta(\hat\zeta)$ evaluated at the ML estimate, given that we
can compute these quantities for the original parameterization. Consider
the partition of $I_\zeta(\hat\zeta)$ given by (u is a scalar, v is an r × 1 vector)

\[
I_\zeta(\hat\zeta) = \begin{pmatrix} u & v' \\ v & A \end{pmatrix}
\tag{4.2}
\]

and define $V = A - vv'/u$. We perform a Metropolis step in the alternative
parameterization with a t-density proposal centered at the previous
iteration, with d degrees of freedom and scale matrix $c^2 V$. To implement this
MCMC method one only has to specify d and c. The number of degrees of
freedom is not an issue; one can even use a Gaussian instead of a t. As for
c, one can always experiment with a few values, and a reasonable starting
guess is $c \approx 2.4/\sqrt{r}$ [Gelman, Carlin, Stern and Rubin (1995)].

We next give the precise details of the algorithm. Assuming that the chain
is currently at state $(\theta^{(old)}, \sigma^2_{(old)}, \xi^{(old)})$, the next state
$(\theta^{(new)}, \sigma^2_{(new)}, \xi^{(new)})$ is determined by the following steps:

Step 1. Draw $\theta^{(new)} \sim N\big((X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y,\ \sigma^2_{(old)}(X'\Sigma^{-1}X)^{-1}\big)$, where
$\Sigma = \Sigma(\xi^{(old)})$.

Step 2. Draw $\sigma^2_{(new)} \sim \Gamma^{-1}\big(n/2 + a - 1,\ (y - X\theta^{(new)})'\Sigma^{-1}(y - X\theta^{(new)})/2 + r_{\sigma^2}\big)$,
where $\Sigma = \Sigma(\xi^{(old)})$ and by convention a = 2 in the case of the empirical Bayes method.
Recall that in the case of the reference and independence Jeffreys priors
a = 1, whereas in the Jeffreys-rule prior a = 1 + q/2. Also, $r_{\sigma^2}$ is the rate of the
exponential prior in the empirical Bayes case and should be set to zero
otherwise.

Step 3. Draw $\zeta^* \sim t_{(d)}(g_2(\xi^{(old)}), c^2 V)$ and let $\xi^* = g_2^{-1}(\zeta^*)$; then

\[
\xi^{(new)} =
\begin{cases}
\xi^*, & \text{w.p. } \rho, \\
\xi^{(old)}, & \text{w.p. } 1 - \rho,
\end{cases}
\]

where

\[
\rho = \min\left\{1,\
\frac{N(y \mid X\theta^{(new)}, \sigma^2_{(new)}\Sigma(\xi^*))\ \pi(\xi^*)\ q(\xi^{(old)} \mid \xi^*)}
     {N(y \mid X\theta^{(new)}, \sigma^2_{(new)}\Sigma(\xi^{(old)}))\ \pi(\xi^{(old)})\ q(\xi^* \mid \xi^{(old)})}
\right\}
\]

and

\[
q(x \mid y) = t_{(d)}(g_2(x) \mid g_2(y), c^2 V)\ \left|\frac{\partial g_2}{\partial \xi}(x)\right|.
\]
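
The three steps can be sketched in code as follows. This is an illustration under stated assumptions rather than the implementation used for the paper: the transformation $g_2$ is taken to be the componentwise logarithm, the correlation is the separable form (4.3), the prior on $\xi$ is a hypothetical exponential one, the Cholesky factor `V_chol` of the scale matrix is a made-up value, and a small diagonal jitter is added purely as a numerical safeguard. All function and variable names are introduced here.

```python
import numpy as np

def corr_matrix(points, xi, jitter=1e-8):
    """Separable correlation as in (4.3); jitter is a numerical safeguard."""
    sq_dist = (points[:, None, :] - points[None, :, :]) ** 2
    return np.exp(-(sq_dist * xi).sum(axis=-1)) + jitter * np.eye(len(points))

def log_likelihood(y, X, theta, sigma2, Sigma):
    """log N(y | X theta, sigma2 * Sigma), up to an additive constant."""
    resid = y - X @ theta
    _, logdet = np.linalg.slogdet(Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * (len(y) * np.log(sigma2) + logdet + quad / sigma2)

def sweep(y, X, points, sigma2, xi, log_prior, V_chol, c, d, a, rate, rng):
    n = len(y)
    Sigma = corr_matrix(points, xi)
    # Step 1: Gaussian full conditional of theta.
    Si_X = np.linalg.solve(Sigma, X)
    XtSiX_inv = np.linalg.inv(X.T @ Si_X)
    theta = rng.multivariate_normal(XtSiX_inv @ (Si_X.T @ y), sigma2 * XtSiX_inv)
    # Step 2: inverse-gamma full conditional of sigma2 (scale / Gamma(shape)).
    resid = y - X @ theta
    scale = 0.5 * (resid @ np.linalg.solve(Sigma, resid)) + rate
    sigma2 = scale / rng.gamma(n / 2 + a - 1)
    # Step 3: Metropolis step with a t proposal on the log scale.
    zeta_old = np.log(xi)
    t_noise = V_chol @ rng.standard_normal(len(xi)) / np.sqrt(rng.chisquare(d) / d)
    zeta_star = zeta_old + c * t_noise
    xi_star = np.exp(zeta_star)
    # The t density is symmetric, so only the Jacobian terms of q remain;
    # for the log transform they contribute sum(log xi) on each side.
    log_num = (log_likelihood(y, X, theta, sigma2, corr_matrix(points, xi_star))
               + log_prior(xi_star) + zeta_star.sum())
    log_den = (log_likelihood(y, X, theta, sigma2, Sigma)
               + log_prior(xi) + zeta_old.sum())
    if np.log(rng.uniform()) < log_num - log_den:
        xi = xi_star
    return theta, sigma2, xi

# Small synthetic run (all values hypothetical).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 4)
points = np.array([(s, t) for s in grid for t in grid])
X = np.ones((len(points), 1))
y = 1.0 + np.linalg.cholesky(corr_matrix(points, np.array([5.0, 5.0]))) @ rng.standard_normal(len(points))
log_prior = lambda xi: -xi.sum() / 50.0
sigma2, xi, chain = 1.0, np.array([1.0, 1.0]), []
for _ in range(200):
    theta, sigma2, xi = sweep(y, X, points, sigma2, xi, log_prior,
                              V_chol=0.5 * np.eye(2), c=2.4 / np.sqrt(2),
                              d=3, a=1, rate=0.0, rng=rng)
    chain.append((theta, sigma2, xi))
```

Because the whole vector $\xi$ is proposed at once, each sweep needs only one extra likelihood evaluation, which is the efficiency argument made above for block updates.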



4.5. An example. For illustrative purposes, we are going to consider a 
simplified version of a problem analyzed in detail in Bayarri et al. (2002). 
In that paper, the authors consider the analysis of a computer model that 
simulates the crash of prototype vehicles against a barrier, recording the 
velocity curve from the point of impact until the vehicle stops. For our 
purposes, we will imagine that if we input an impact velocity and an instant
in time, the computer model will return the velocity of the vehicle at that
point in time after impact.

The data consist of the output of the computer model at 19 values of
t and 9 initial velocities v, which corresponds to a 171-point design set
that follows a Cartesian product. If we subtract the initial velocity from
each of the individual curves, all the curves start at
zero and decay at a rate roughly proportional to the initial velocity. For
this transformed data, which is plotted in Figure 1, it seems reasonable to
consider the mean function $E\,Y(v, t) = vt\theta$, which in the notation of Section 2
translates into q = 1 and $\psi(v, t) = vt$. In particular, this problem satisfies
Assumptions A2 and A3.

In terms of the correlation function, we will assume a two-dimensional 
separable power exponential function, with the roughness parameters fixed 
at 2, and the parameterization 

\[
c((t_1, v_1), (t_2, v_2)) = \exp(-\beta_t |t_1 - t_2|^2)\,\exp(-\beta_v |v_1 - v_2|^2).
\tag{4.3}
\]
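
Because the design is a Cartesian product and the correlation (4.3) is separable, the 171 × 171 correlation matrix is the Kronecker product of a 19 × 19 and a 9 × 9 matrix. The sketch below checks this numerically; the grids and the values of $\beta_t$ and $\beta_v$ are hypothetical, since the actual design points are not listed here.

```python
import numpy as np

def sq_exp_corr(x, beta):
    """One-dimensional power exponential correlation with exponent 2."""
    diff = x[:, None] - x[None, :]
    return np.exp(-beta * diff**2)

# Hypothetical design: 19 time points and 9 initial velocities.
t = np.linspace(0.0, 1.0, 19)
v = np.linspace(0.0, 1.0, 9)
beta_t, beta_v = 2.0, 0.5

# Kronecker form, with t as the slowly varying index.
Sigma_kron = np.kron(sq_exp_corr(t, beta_t), sq_exp_corr(v, beta_v))

# Direct evaluation of (4.3) on all 171 design points, same ordering.
tt, vv = np.meshgrid(t, v, indexing="ij")
pt, pv = tt.ravel(), vv.ravel()
Sigma_direct = (np.exp(-beta_t * (pt[:, None] - pt[None, :]) ** 2)
                * np.exp(-beta_v * (pv[:, None] - pv[None, :]) ** 2))
```

Working with the two small factors instead of the full matrix is what makes the Kronecker facts of the Appendix useful in practice.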

Figure 2 shows estimates of the posterior distribution associated with each
of the priors that we have introduced in the paper. The empirical Bayes
method corresponds to centering the exponential priors at 10 times the computed ML
estimates, and the dotted lines on the figure correspond to those densities.
The sampling mechanism was implemented exactly as detailed above, and
we used a t density with three degrees of freedom and c = 1.7. The
transformation g that we used was the logarithmic one. The acceptance rate was
roughly 0.27 in all cases, and the results correspond to 50,000 iterations,
minus 500 that were discarded as burn-in. The chains seem to reach
stationarity very quickly, and 50,000 iterations is certainly more than we actually
need for reliable inference.

For these data the answers are quite robust with respect to the type of 
default prior chosen. In the next section we study finer details of the methods 
proposed in the paper, in particular by studying the frequentist properties 
of the resulting Bayesian procedures. 

5. Comparison of the priors. When faced with more than one valid de- 
fault prior specification strategy, it is often argued that one way to distin- 
guish among these is by studying the frequentist properties of the resulting 
Bayesian inferential procedures. This usually takes the form of computing
the frequentist coverage of the $(1 - \gamma) \times 100\%$ equal-tailed Bayesian
credible interval for one of the parameters of interest. The closer to the nominal
level is this frequentist coverage, the "better" is the prior. For discussion
and further references, see Kass and Wasserman (1996).
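
The coverage computation can be sketched on a toy model where the answer is known: a normal mean with unit variance and a flat prior, for which the posterior is Gaussian and the equal-tailed interval has exact frequentist coverage. The model, the sample sizes and the function names are assumptions of this illustration and are unrelated to the Gaussian process study reported below.

```python
import numpy as np

def equal_tailed_interval(samples, gamma=0.05):
    """(1 - gamma) equal-tailed interval from posterior draws (last axis)."""
    lower, upper = np.quantile(samples, [gamma / 2, 1 - gamma / 2], axis=-1)
    return lower, upper

def coverage_summary(lower, upper, truth):
    """Estimated coverage, its binomial standard error, and mean length."""
    covered = (lower <= truth) & (truth <= upper)
    p = covered.mean()
    return p, np.sqrt(p * (1 - p) / covered.size), (upper - lower).mean()

rng = np.random.default_rng(0)
truth, n_obs, n_rep, n_draws = 1.0, 10, 2000, 2000

# Simulate n_rep data sets and, for each, draw from the posterior N(ybar, 1/n).
data = rng.normal(truth, 1.0, size=(n_rep, n_obs))
post_mean = data.mean(axis=1)
draws = post_mean[:, None] + rng.standard_normal((n_rep, n_draws)) / np.sqrt(n_obs)

lower, upper = equal_tailed_interval(draws)
coverage, std_err, mean_length = coverage_summary(lower, upper, truth)
```

In the study described next, the posterior draws would come from the MCMC scheme of Section 4.4 rather than from a closed-form posterior.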

Fig. 2. Smoothed histograms of samples drawn from the posterior distribution associated
with each of the priors. Dotted lines correspond to exponential priors of empirical Bayes,
and the triangles indicate the marginal likelihood estimates.

Our study was conducted in the context of the power exponential
correlation function with p = 2, parameterized as in (4.3), with the roughness
parameters ($\alpha_1$ and $\alpha_2$) treated as known. For a 5 × 5 equally spaced grid in
[0, 1] × [0, 1], we considered two options for the mean structure: an unknown
constant term that we fixed at $\theta = 1$ and $E[Y(x) \mid \eta] = \theta x_1$, where again $\theta$
was fixed at 1. Several choices for the other parameters were considered,
and we simulated 3000 draws from the ensuing model. Each Markov chain
consisted of 15,000 iterations, the first 100 being discarded as burn-in. We
calculated 95% credible intervals for $\sigma^2$, $\theta$ and $\beta_1$, and partial results are
summarized in Table 2. Along with the estimate of the coverage probability,
we present also an estimate of the expected length of the resulting credible
interval and the standard deviation associated with the estimate.

A clear conclusion to extract from the results of these experiments is 
the comparatively poor behavior of the empirical Bayes method, which re- 
sulted from centering the priors at 10 times the computed ML estimate. 
This conclusion is particularly important since, as we have already pointed 
out, whenever a formal objective prior is not available, practitioners tend to 
resort to similar strategies to produce so-called "vague" or "diffuse" priors. 
On the basis of this study, one might argue against this type of approach to 
objective Bayesian analysis. 

It is also possible to argue against the use of the Jeffreys-rule prior, and 
the reason for its inferior behavior has to do with the power a = 1 + q/2 in 
its formulation. This was already reported in Berger, De Oliveira and Sanso 
(2001): this type of prior is known to add spurious degrees of freedom to 
uncertainty statements. 

The present study is not quite decisive about how the reference prior and 
the independence Jeffreys prior compare, and one may conclude that they 
present virtually equivalent performance. On the basis that the reference 
prior is computationally more demanding, we recommend the independence 
Jeffreys prior as a default prior for the problem at hand. 



A.0.1. Auxiliary facts. Throughout the Appendix we will repeatedly use
the following results. They are standard propositions whose proofs can easily
be found [e.g., Harville (1997) and Tong (1990)].

Fact 1. If for i = 1, ..., r the product $A_i B_i$ is possible, then we have

\[
\Big(\bigotimes_{i=1}^r A_i\Big)\Big(\bigotimes_{i=1}^r B_i\Big) = \bigotimes_{i=1}^r A_i B_i.
\]

Fact 2. If the matrices $A_i$, i = 1, ..., p, are invertible, then $\bigotimes_{i=1}^p A_i$ is
invertible and one has

\[
\Big(\bigotimes_{i=1}^p A_i\Big)^{-1} = \bigotimes_{i=1}^p A_i^{-1}.
\]

Table 2

Coverage probability of 95% credible intervals

                                Coverage prob.   Expected length   Std. dev.

Interval for $\sigma^2$; parameter vector fixed at (1.5, 3.2, 3.6, 1.5, 1.7); EY = 1.
  Empirical Bayes                   0.857            2.904            0.005
  Reference prior                   0.956           11.354            0.003
  Independence Jeffreys prior       0.954         [illegible]      [illegible]
  Jeffreys-rule prior               0.946            6.334            0.003

Interval for $\theta$; parameter vector fixed at (1.5, 3.2, 3.6, 1.5, 1.7); EY = $\theta$ = 1.
  Empirical Bayes                   0.805            3.156            0.007
  Reference prior                   0.952            7.434            0.005
  Independence Jeffreys prior       0.956            7.410            0.005
  Jeffreys-rule prior               0.902            5.265            0.012

Interval for $\theta$; parameter vector fixed at (1.5, 0.2, 0.6, 1.5, 1.7); EY = $\theta x_1$ = $x_1$.
  Empirical Bayes                   0.840            1.077            0.007
  Reference prior                   0.948            0.950            0.003
  Independence Jeffreys prior       0.955            0.931            0.004
  Jeffreys-rule prior               0.928            1.027            0.005

Interval for $\beta_1$; parameter vector fixed at (1.5, 0.2, 0.6, 1.5, 1.7); EY = 1.
  Empirical Bayes                   0.826            1.145            0.007
  Reference prior                   0.946            0.901            0.004
  Independence Jeffreys prior       0.954            0.890            0.004
  Jeffreys-rule prior               0.919            1.059            0.005

Interval for $\beta_1$; parameter vector fixed at (1.5, 0.2, 0.6, 1.0, 1.2); EY = 1.
  Empirical Bayes                   0.797            1.815            0.007
  Reference prior                   0.948            1.042            0.004
  Independence Jeffreys prior       0.948            0.996            0.004
  Jeffreys-rule prior               0.916            1.255            0.005

The order of the fixed parameter vector is $(\sigma^2, \beta_1, \beta_2, \alpha_1, \alpha_2)$.

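Fact 3. If the matrices $A_i$, i = 1, ..., p, are of dimension $n_i \times n_i$, then
with $n = \prod_{k=1}^p n_k$,

\[
\Big|\bigotimes_{i=1}^p A_i\Big| = \prod_{i=1}^p |A_i|^{n/n_i}.
\]

Fact 4. If A and B are m × n matrices, and C is p × q, one has

\[
C \otimes (A + B) = C \otimes A + C \otimes B, \qquad (A + B) \otimes C = A \otimes C + B \otimes C.
\]

Fact 5. $\operatorname{tr}(A \otimes B) = (\operatorname{tr} A)(\operatorname{tr} B)$.

Fact 6. $(A \otimes B)' = A' \otimes B'$.

Fact 7. If A is a function of $\theta$ but B is not, then

\[
\frac{\partial}{\partial\theta}[A \otimes B] = \frac{\partial A}{\partial\theta} \otimes B, \qquad
\frac{\partial}{\partial\theta}[B \otimes A] = B \otimes \frac{\partial A}{\partial\theta}.
\]

Fact 8. If $\Sigma = \Sigma(\theta)$ is a positive definite matrix whose entries are
differentiable with respect to $\theta$, then with $\dot\Sigma = \frac{\partial}{\partial\theta}\Sigma$,

\[
\frac{\partial}{\partial\theta}\log|\Sigma| = \operatorname{tr}[\Sigma^{-1}\dot\Sigma], \qquad
\frac{\partial}{\partial\theta}\Sigma^{-1} = -\Sigma^{-1}\dot\Sigma\,\Sigma^{-1}.
\]

Fact 9. Let $X \sim N(\mu, \Sigma)$ and let A and B be symmetric matrices. Then

\[
E\,X'AX = \operatorname{tr} A\Sigma + \mu'A\mu, \qquad
\operatorname{Cov}(X'AX, X'BX) = 2\operatorname{tr} A\Sigma B\Sigma + 4\mu'A\Sigma B\mu.
\]

Fact 10. Let A be a nonsingular matrix. Then $|A + \mathbf{1}\mathbf{1}'| = |A|(1 + \mathbf{1}'A^{-1}\mathbf{1})$.
If, furthermore, $\mathbf{1}'A^{-1}\mathbf{1} \neq -1$, then $A + \mathbf{1}\mathbf{1}'$ is nonsingular and

\[
(A + \mathbf{1}\mathbf{1}')^{-1} = A^{-1} - \frac{A^{-1}\mathbf{1}\mathbf{1}'A^{-1}}{1 + \mathbf{1}'A^{-1}\mathbf{1}}.
\]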
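
Several of these identities are easy to check numerically. The sketch below does so for the inverse, determinant and trace of a Kronecker product and for the rank-one update of Fact 10, using two arbitrary positive definite matrices; the matrices and variable names are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(n):
    """Arbitrary symmetric positive definite test matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = random_spd(3), random_spd(4)
K = np.kron(A, B)
n = A.shape[0] * B.shape[0]

# Inverse of a Kronecker product is the Kronecker product of the inverses.
inverse_ok = np.allclose(np.linalg.inv(K),
                         np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# Determinant: |A (x) B| = |A|^(n/n_A) |B|^(n/n_B).
det_ok = np.isclose(np.linalg.det(K),
                    np.linalg.det(A) ** (n / 3) * np.linalg.det(B) ** (n / 4))

# Trace: tr(A (x) B) = tr(A) tr(B).
trace_ok = np.isclose(np.trace(K), np.trace(A) * np.trace(B))

# Rank-one update with the vector of ones (determinant and inverse).
ones = np.ones(3)
A_inv = np.linalg.inv(A)
s = ones @ A_inv @ ones
update = A + np.outer(ones, ones)
rank_one_det_ok = np.isclose(np.linalg.det(update), np.linalg.det(A) * (1 + s))
rank_one_inv_ok = np.allclose(
    np.linalg.inv(update),
    A_inv - (A_inv @ np.outer(ones, ones) @ A_inv) / (1 + s))
```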



A.0.2. Proofs of Section 2. 



Proof of Proposition 2.1. Define the shorthand $l^I = \log L^I(\sigma^2, \xi \mid y)$.
Berger, De Oliveira and Sanso (2001) prove that $\partial l^I/\partial\sigma^2 = (S^2_\xi - E S^2_\xi)/(2\sigma^4)$
and $\partial l^I/\partial\xi_k = (R^k_\xi - E R^k_\xi)/(2\sigma^2)$, where $S^2_\xi/\sigma^2 \sim \chi^2_{n-q}$, and $R^k_\xi$ is a quadratic
form on $PY \sim N(0, \sigma^2 P\Sigma)$, associated with the matrix $\Sigma^{-1}\dot\Sigma^k\Sigma^{-1}$. The
result follows from Fact 9 and elementary properties of the determinant
function. □



Proof of Proposition 2.2. Suppose $Y \mid \theta, \xi \sim N(X\theta, \Sigma)$, where $\Sigma = \Sigma(\xi)$,
and let $l = \log L(\theta, \xi \mid y)$ be the associated log-likelihood function.
Standard results of matrix differentiation plus Facts 8 and 9 of Appendix A.0.1
allow one to conclude that

\[
\partial l/\partial\theta = X'\Sigma^{-1}y - E\,X'\Sigma^{-1}y
\]

and $\partial l/\partial\xi_k = (R^k_\xi - E R^k_\xi)/2$, where $R^k_\xi$ is a quadratic form on
$(Y - X\theta) \sim N(0, \Sigma)$ associated with the matrix $\Sigma^{-1}\dot\Sigma^k\Sigma^{-1}$. The result follows easily
from elementary properties of the determinant function. □



A.0.3. Proofs of Section 3.

A.0.4. Proofs of Section 3.2.

Proof of Lemma 3.1. We start out by proving (3.3). Since
$\Sigma = \bigotimes_{k=1}^r \Sigma_k$, from Fact 2 of Appendix A.0.1 it follows that $\Sigma^{-1} = \bigotimes_{k=1}^r \Sigma_k^{-1}$.
Then by Fact 6 of Appendix A.0.1, repeated application of Fact 1 and by
Assumption A2, one has $X'\Sigma^{-1}X = \bigotimes_{k=1}^r X_k'\Sigma_k^{-1}X_k$, which in conjunction
with Fact 3 establishes the result. This last expression, Fact 2 and repeated
application of Fact 1 show (3.2). □

Proof of Lemma 3.2. Formulas (3.9)-(3.11) are shown in Berger, De 
Oliveira and Sanso (2001) to follow essentially from (3.4) and from the fact 
that lim^_ + o z/ (C) = 0, a simple consequence of (3.4) and /o(0|£) = 0. The as- 
sumption l^ fc D A T 1 l nfe ^ and (3.5) assure that the matrices are well defined 
and that the expansions are meaningful. 

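We next prove (3.12). For simplicity, we drop the subscript k in the
sequel. Suppose then that $X = \mathbf{1}$, and write $\Sigma = A + \mathbf{1}\mathbf{1}'$. Using Fact 10
and simple manipulations, Berger, De Oliveira and Sanso (2001) prove, in
a more general context, that
$\mathbf{1}(\mathbf{1}'\Sigma^{-1}\mathbf{1})^{-1}\mathbf{1}' = \mathbf{1}(\mathbf{1}'A^{-1}\mathbf{1})^{-1}\mathbf{1}' + \mathbf{1}\mathbf{1}'$. Pre-
and postmultiplying this last equation by
$\Sigma^{-1} = A^{-1} - (A^{-1}\mathbf{1}\mathbf{1}'A^{-1})/(1 + \mathbf{1}'A^{-1}\mathbf{1})$
and simplifying yields, after some algebra,

\[
\Sigma^{-1}\mathbf{1}(\mathbf{1}'\Sigma^{-1}\mathbf{1})^{-1}\mathbf{1}'\Sigma^{-1}
= \frac{A^{-1}\mathbf{1}\mathbf{1}'A^{-1}}{(1 + \mathbf{1}'A^{-1}\mathbf{1})\,\mathbf{1}'A^{-1}\mathbf{1}}.
\]
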
Substituting A = y(£)(D + o(l)) in the right-hand side yields the first part 
of (3.12). The second part follows directly from formula = ST^X^X^ x 
E7 Xfc) _1 XlEr and expansion (3.9). Equation (3.5) assures that is 
well defined, and it is also easy to show that ^ 0. □ 
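
The algebra behind the identity used for (3.12) in the proof of Lemma 3.2 can be spelled out as follows. The shorthand $s = \mathbf{1}'A^{-1}\mathbf{1}$ is introduced here only for this derivation, and A is taken to be symmetric, as it is for the matrices considered in the proof.

```latex
% Starting point (Fact 10), with s = 1'A^{-1}1:
%   \Sigma^{-1} = A^{-1} - A^{-1} 1 1' A^{-1} / (1 + s).
\begin{align*}
\Sigma^{-1}\mathbf{1}
  &= A^{-1}\mathbf{1} - \frac{A^{-1}\mathbf{1}\, s}{1 + s}
   = \frac{A^{-1}\mathbf{1}}{1 + s}, \\
\mathbf{1}'\Sigma^{-1}\mathbf{1}
  &= \frac{s}{1 + s}, \\
\Sigma^{-1}\mathbf{1}\,(\mathbf{1}'\Sigma^{-1}\mathbf{1})^{-1}\,\mathbf{1}'\Sigma^{-1}
  &= \frac{A^{-1}\mathbf{1}\mathbf{1}'A^{-1}}{(1 + s)^2}\cdot\frac{1 + s}{s}
   = \frac{A^{-1}\mathbf{1}\mathbf{1}'A^{-1}}{(1 + s)\, s}.
\end{align*}
```

The second line also gives $(\mathbf{1}'\Sigma^{-1}\mathbf{1})^{-1} = 1/s + 1$, which is the scalar form of the identity quoted from Berger, De Oliveira and Sanso (2001).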

Proof of Proposition 3.3. Formula (3.13) is a simple consequence 
of Lemma 3.1 and Fact 3 of Appendix A.0.1. The continuity of L 1 ^ | y) as a 
function of £ follows from the continuity of p as a function of the parameter 
and the product form of the correlation function. We will use throughout 
this proof the facts that, as £ — ► 0, v{£) — ► 0, and that, as — ► oo, E^ — ► I Hk . 
These are simple consequences of the assumptions of Lemma 3.2. 

Using (3.13) and the expansions of Lemma 3.2, it is possible to check that 
in the circumstances of part (a) of Proposition 3.3 we have 

L^my) oc n K6o (nfc - 3+2a)/2 n ^6) i/2 (i+ff(0) 

keB k&AC\B 

-(n-3+2a)/2 

c- n k&)c*' 

keAnB 
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where C = <gf fc= i C fc , C* = ®£ =1 C* 



with 



c »={i G " 



ifkeB 
if fee B 



and 




F fc , 

x fe( X fc X fc) lx * 



if keAnB, 

if k £ An B , 
ii kE B. 



We next argue that the quantity within the braces can be bounded between 
two positive constants as long as k G B, are sufficiently small, in which 
case the result follows immediately. 

If A 7^ 0, then that quantity converges to y'Cy, that we claim to be 
positive. To see that, recall that the Kronecker product of positive definite 
matrices is still a positive definite matrix. As a consequence, it suffices to 
show that Gfc is positive definite. This follows from (3.9) and the fact that 
Xl/T 1 is positive definite (recall that v is positive). 

If A = 0, then it suffices to show that y'(C — C*)y > 0. To see that, 
note that it can be checked that C - C* = C[I n - X(X'CX)^X'C]. The 
quantity between brackets in the last expression is a projection matrix, and 
hence [cf. Harville (1997), page 262] 

C[I n -X(X'CX)" 1 X / C] = [I n -X(X'CX)" 1 X'C]'C[I n -X(X'CX)^ 1 X'C]. 

The result follows since, from what we have seen above, C is positive definite. 

The proof of part (b) follows essentially from the same type of arguments. 
Let C and C* be as before but with I nfc and Xfe(X^.Xfc) _1 X^, replaced by 
S^" 1 and <&fc, respectively. It is possible to check that 



If A ^ , then the quantity within the braces in the last factor converges to 
y'Cy > 0. In this case, h can be taken to be the second-to-last factor. 

If A = 0, then we must show that C — C* is positive definite in the set 
{^ifc < ik < -^fci k S B}. This follows from an argument similar to the one 
used for part (a), and therefore will be omitted. Here h can be chosen to be 
the product of the last and second-to-last factors. □ 



oc n K6) K - 3+2a)/2 n ^k) i/2 (i+ g m,k€B})) 




x n{i^ fc r nw/2 ( x ^ fc - ix fc )" i/2 } 



k£B 



X 
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A. 0.5. Proofs of Section 3.3. 

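Proof of Proposition 3.4. Using Facts 1 and 7 of Appendix A.0.1
along with (3.2), it is clear that

\[
W_k = I_{n_1} \otimes \cdots \otimes \dot\Sigma_k\Sigma_k^{-1} \otimes \cdots \otimes I_{n_r}
 - \Sigma_1\Phi_1 \otimes \cdots \otimes \dot\Sigma_k\Phi_k \otimes \cdots \otimes \Sigma_r\Phi_r.
\]

Next, note that $\Sigma_k\Phi_k$ is a projection matrix, so that it is idempotent, and
its rank, and therefore its trace, is $\operatorname{rank} X_k = 1$. Also, $\Phi_k\Sigma_k\Phi_k = \Phi_k$.

These facts, Fact 5 of Appendix A.0.1, some algebra and the known fact
that tr AB = tr BA whenever the products are possible, allow one to show
that

\begin{align*}
\operatorname{tr} W_k &= n_{(k)}\operatorname{tr}\dot\Sigma_k\Sigma_k^{-1} - \operatorname{tr}\dot\Sigma_k\Phi_k, \\
\operatorname{tr} W_k^2 &= n_{(k)}\operatorname{tr}(\dot\Sigma_k\Sigma_k^{-1})^2 + \operatorname{tr}(\dot\Sigma_k\Phi_k)^2
  - 2\operatorname{tr}\dot\Sigma_k\Sigma_k^{-1}\dot\Sigma_k\Phi_k, \\
\operatorname{tr} W_iW_j &= \prod_{m \neq i,j} n_m \times \operatorname{tr}\dot\Sigma_i\Sigma_i^{-1}\operatorname{tr}\dot\Sigma_j\Sigma_j^{-1}
  - \operatorname{tr}\dot\Sigma_i\Phi_i\operatorname{tr}\dot\Sigma_j\Phi_j,
\end{align*}

the key point to note being that only $\operatorname{tr} W_iW_j$ depends on more than one
parameter. Therefore, we can write (2.7) as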



(A.l) 




where T is (p — 1) x (p — 1) and does not depend on £ r , u is (p — 1) x 1 and 
9?Hr) = trWj?, which depends on £ r only. As a consequence of the above 
block format, we have [cf. Harville (1997), page 188] 

\Ir(£)\ = \T\(g?(£ r )-u;'T- l u) < \T\g^ r ), 

the last step following from the fact that T _1 is positive definite. If we 
repeat this procedure p times, we end up proving (3.18) by defining 7r k (^ k ) = 
bf(6)] 1/2 = [trWl]V2. 

We now study n k (^ k ) as — ► 0. It is easy to verify that T) k G k is idempo- 
tent and that tr T) k G k = n k — l. Also, it is possible to show that D k G k T) k H k = 
DfcHfc and that trD^H^ = 1. Note also that, as a consequence of the as- 
sumptions we have, as — ► 0, 

(A.2) £ fc = i/(&)D fc (l + o(l)). 

These properties and simple expansions along with (3.9) and (3.12) show (3.19). 
Assumption .48 is capital to this result. 

Since as — > oo, H k — > I nk , the integrability of ir k (£, k ) at infinity follows 
from Assumption A7. □ 
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Proof of Proposition 3.5. This proof essentially mimics the one 
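of Proposition 3.4. Using the definition of $U_k$ and Facts 1 and 7 of
Appendix A.0.1, it is easy to check that

\[
U_k = I_{n_1} \otimes \cdots \otimes \dot\Sigma_k\Sigma_k^{-1} \otimes \cdots \otimes I_{n_r}.
\]

Next, essentially the same type of arguments as before allows one to write

\begin{align*}
\operatorname{tr} U_k &= n_{(k)}\operatorname{tr}\dot\Sigma_k\Sigma_k^{-1}, \\
\operatorname{tr} U_k^2 &= n_{(k)}\operatorname{tr}(\dot\Sigma_k\Sigma_k^{-1})^2, \\
\operatorname{tr} U_iU_j &= \prod_{m \neq i,j} n_m \times \operatorname{tr}\dot\Sigma_i\Sigma_i^{-1}\operatorname{tr}\dot\Sigma_j\Sigma_j^{-1}.
\end{align*}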

Again, only trUjU,- depends on more than one parameter, and the con- 
struction carried out in the proof of Proposition 3.4 can be repeated to 
show (3.20), where now vrf (6) oc (trUfJ 1 / 2 and Trf (&) oc Trf (&) IX^^X*. 

To obtain the behavior of these functions as — > 0, it suffices to use 
expansions (3.9), (3.11) and (A. 2). It is also necessary to recall that D^G^ 
is idempotent and that its trace is — 1 (and hence Assumption A8). 

The integrability at infinity of 'K% Ji {') follows from the fact that, as —> oo, 
5]^ — > I nfe , and Assumption A7. □ 

A.0.6. Proofs of Section 3.4.

Proof of Theorem 3.6. We will determine conditions under which

0< / L / (^|y)7r(Od6---^r<oo 
Jo 

holds. To simplify the notation, we will write /(£) = L (£|y). The function 
7r(£) will be associated either with the reference prior or with one of the 
Jeffreys priors we considered, but we will carry out the calculations for a 
general exponent a in the integrated likelihood. In all of these instances of it, 
there are functions 7Tfc(£fc), k = 1, .. .,r, such that 7r(£) < Ilfe=i n k(^k), and 
those have essentially the same behavior across the different choices for 7r(£), 
so that a common proof can be sought — compare Propositions 3.4 and 3.5. 
Note that 

POO rOO 

(A.3) / «)rf?i-^< /(0ti(6)#i •••*>(&.)<*&., 
JO Jo 

and therefore we will have posterior propriety if the right-hand side is finite. 

The right-hand side of (A.3) can be partitioned into a sum of integrals, 
each of which has different regions of integration. If the region is of the form 
[Si, +oo[ x • • • x [5 r , +oo[, then there are no issues to address since the tt^ are 
integrable at infinity and in this set the integrated likelihood is bounded. 
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Therefore, we need only consider regions of the following type: let B be 
a nonempty but otherwise arbitrary subset of { 1 , . . . , n} , and define said 
regions by: 

(i) for k € B, < < 5k, where 5k can be chosen arbitrarily small; for 
k £ B, £,k> Mk, where Mk can be assumed arbitrarily large; 

(ii) for k £ B, < ^ < 5k, where 5k can be chosen arbitrarily small; for 
k £ B, 5k < £fc < Mk, where Mk is finite but otherwise arbitrary. 

Let us consider (i) first. Combining expansions (3.14), (3.19) and (3.21), 
it is easy to see that 

k=l kGAnB 

x n w(Ck)ntk) {nk - 5+2a)/2 

fceAnB 

where ctfc is defined in Proposition 3.5 (a^ = 1 in the case of the reference 
prior). 

When we look at situation (ii), formula (3.15) and the same expansions as 
before for the priors yield an expression formally identical to the one above 
multiplied by h(^k, k £ B). 

Since the functions iik are integrable at infinity, and for small enough 
£> ^(0 < 1) ^ i s clear that posterior propriety will be achieved for all three 
priors whenever the function |z/(<!;)|i/(£)( nfe-5+2a )/ 2 is integrable at the origin 
for every k. This reduces to (3.23) since v = o(l) and minn^ > 2. □ 

Acknowledgment. The author would like to thank Professor James O. 
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